devices/fs: fix FUSE_ALLOW_IDMAP regression on Linux 6.12+ #584
Closed
JAORMX wants to merge 1 commit into containers:main from
Linux 6.12 added a check in fs/fuse/inode.c:process_init_reply that
requires FUSE_POSIX_ACL to be present whenever FUSE_ALLOW_IDMAP is
advertised. Without it, the kernel sets conn_error = 1, silently
degrading the virtiofs connection. In this degraded state, reading and
modifying existing files still works, but creating new files or
directories fails with EPERM.
This breaks any setup where the host user's GID doesn't match the guest
user's GID (e.g., host 1000:1001 vs guest 1000:1000), because the
idmapped mount that would translate GIDs never gets established.
The root cause was that ALLOW_IDMAP was advertised from the server-global
"supported" flags in server.rs, which meant it got enabled regardless of
whether the backend could actually fulfill the POSIX_ACL contract that
the kernel now requires alongside it.
The fix moves ALLOW_IDMAP out of the server-global supported mask and
into each passthrough backend's init() return value, coupled with
POSIX_ACL. Now the negotiation looks like this:
1. If the kernel is capable of POSIX_ACL and the backend has xattr
support enabled, advertise POSIX_ACL.
2. Only then, if the kernel is also capable of ALLOW_IDMAP, advertise
ALLOW_IDMAP alongside it.
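
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the helper name negotiate_idmap, its parameters, and the plain u64 flag constants are hypothetical, and the bit positions are stand-ins that should be checked against the FUSE ABI definitions the project actually uses.

```rust
// Illustrative flag bits (assumption): the authoritative values live in the
// FUSE ABI definitions, not in this sketch.
const POSIX_ACL: u64 = 1 << 20;
const ALLOW_IDMAP: u64 = 1 << 40;

/// Hypothetical helper returning the flags a passthrough backend would
/// enable from its init() call.
///
/// `capable` is the set of flags the kernel offered; `xattr_enabled`
/// reflects whether the backend was configured with xattr support.
fn negotiate_idmap(capable: u64, xattr_enabled: bool) -> u64 {
    let mut enabled = 0u64;

    // Step 1: POSIX_ACL requires both kernel capability and xattr support.
    if (capable & POSIX_ACL) != 0 && xattr_enabled {
        enabled |= POSIX_ACL;

        // Step 2: ALLOW_IDMAP is only ever advertised together with POSIX_ACL.
        if (capable & ALLOW_IDMAP) != 0 {
            enabled |= ALLOW_IDMAP;
        }
    }

    enabled
}
```

Nesting the second check inside the first means ALLOW_IDMAP can never be returned without POSIX_ACL: with xattr support disabled, the helper returns 0 even if the kernel offers both flags.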
This works for passthrough filesystems because the POSIX_ACL contract
(mode-ACL synchronization, default ACL inheritance, ACL xattr storage)
is fulfilled by the host kernel. The passthrough already forwards all
xattr operations, including system.posix_acl_access and
system.posix_acl_default, directly to the host via libc calls with no
namespace filtering.
Note that on kernels 6.2+, the FUSE VFS layer has fuse_get_acl() and
fuse_set_acl() inode operations that handle idmapped UID/GID translation
within ACL entries, so the passthrough server doesn't need to interpret
ACL binary data itself.
Both the Linux and macOS passthrough backends are updated with the same
logic so we don't regress macOS-hosted Linux guests.
Fixes: containers#568
Signed-off-by: Juan Antonio Osorio <ozz@stacklok.com>
0253b8b to 951b4a4
Collaborator
The patch itself is probably right, but I have the same problem I had with #568: I can't reproduce it myself, nor figure out in which context it can be reproduced. In both 6.12.x and master, virtio-fs sets Do you know where
Author
I was going down the rabbit hole, debugging a permissions issue I have with a project that leverages libkrun, and I stumbled upon this. I did the PR, went for lunch, and afterwards was struggling to test it out... turns out it doesn't actually solve the issue. So, yeah, I'll abandon this. I thought I had a reproducer, but it turns out I don't.
Context
Linux 6.12 added a check in fs/fuse/inode.c:process_init_reply that requires FUSE_POSIX_ACL to be present whenever FUSE_ALLOW_IDMAP is advertised. Without it, the kernel sets conn_error = 1, silently degrading the virtiofs connection. In this degraded state, reading and modifying existing files still works, but creating new files or directories fails with EPERM.

This breaks any setup where the host user's GID doesn't match the guest user's GID (e.g., host 1000:1001 vs guest 1000:1000), because the idmapped mount that would translate GIDs never gets established.

The problem
ALLOW_IDMAP was advertised from the server-global supported flags in server.rs, which meant it got enabled regardless of whether the backend could actually fulfill the POSIX_ACL contract that the kernel now requires alongside it.

What this does
Moves ALLOW_IDMAP out of the server-global supported mask and into each passthrough backend's init() return value, coupled with POSIX_ACL:
1. If the kernel is capable of POSIX_ACL and the backend has xattr support enabled, advertise POSIX_ACL.
2. Only then, if the kernel is also capable of ALLOW_IDMAP, advertise ALLOW_IDMAP alongside it.

This works for passthrough filesystems because the POSIX_ACL contract (mode-ACL synchronization, default ACL inheritance, ACL xattr storage) is fulfilled by the host kernel. The passthrough already forwards all xattr operations, including system.posix_acl_access and system.posix_acl_default, directly to the host via libc calls with no namespace filtering.

On kernels 6.2+, the FUSE VFS layer has fuse_get_acl() / fuse_set_acl() inode operations that handle idmapped UID/GID translation within ACL entries, so the passthrough server doesn't need to interpret ACL binary data itself.

Both the Linux and macOS passthrough backends are updated with the same logic so we don't regress macOS-hosted Linux guests.

Fixes #568